Search | VHL Regional Portal

AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences.

Varadi, Mihaly; Bertoni, Damian; Magana, Paulyna; Paramval, Urmila; Pidruchna, Ivanna; Radhakrishnan, Malarvizhi; Tsenkov, Maxim; Nair, Sreenath; Mirdita, Milot; Yeo, Jingi; Kovalevskiy, Oleg; Tunyasuvunakool, Kathryn; Laydon, Agata; Zídek, Augustin; Tomlinson, Hamish; Hariharan, Dhavanthi; Abrahamson, Josh; Green, Tim; Jumper, John; Birney, Ewan; Steinegger, Martin; Hassabis, Demis; Velankar, Sameer.

Nucleic Acids Res ; 52(D1): D368-D375, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933859

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.

The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries, marking a 500-times expansion in size since its initial release in 2021. The structures are predicted using Google DeepMind's AlphaFold 2 artificial intelligence (AI) system. Our new report highlights the latest updates we have made to this database. We have added more data on specific organisms and proteins related to global health and expanded to cover almost the complete UniProt database, a primary data resource of protein sequences. We also made it easier for our users to access the data by directly downloading files or using advanced cloud-based tools. Finally, we have also improved how users view and search through these protein structures, making the user experience smoother and more informative. In short, AlphaFold DB has been growing rapidly and has become more user-friendly and robust to support the broader scientific community.
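The abstract above mentions direct file access and programmatic endpoints. As a minimal sketch, the helpers below only build the URLs for one entry and make no network requests. The function names are hypothetical, and the path patterns (`/api/prediction/<accession>` and `/files/AF-<accession>-F<n>-model_v<version>.<ext>`) and the default model version are assumptions based on the public AlphaFold DB site, not details stated in the abstract; they may change between releases.

```python
# Hypothetical helpers: the URL patterns and default version below are
# assumptions about the public AlphaFold DB site, not taken from the abstract.
BASE_URL = "https://alphafold.ebi.ac.uk"


def prediction_api_url(accession: str) -> str:
    """Return the assumed metadata endpoint for a UniProt accession."""
    return f"{BASE_URL}/api/prediction/{accession.strip().upper()}"


def model_file_url(accession: str, version: int = 4,
                   fragment: int = 1, fmt: str = "pdb") -> str:
    """Return the assumed coordinate-file URL for one predicted model."""
    if fmt not in {"pdb", "cif"}:
        raise ValueError(f"unsupported format: {fmt!r}")
    acc = accession.strip().upper()
    return f"{BASE_URL}/files/AF-{acc}-F{fragment}-model_v{version}.{fmt}"


if __name__ == "__main__":
    # P69905 is the UniProt accession of human haemoglobin subunit alpha.
    print(prediction_api_url("P69905"))
    print(model_file_url("P69905", fmt="cif"))
```

Keeping URL construction separate from downloading makes the assumed patterns easy to adjust if the database changes its layout or model version.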

Subject(s)

Artificial Intelligence , Protein Structure, Secondary , Proteome , Amino Acid Sequence , Databases, Protein , Search Engine , Proteins/chemistry

3D-Beacons: decreasing the gap between protein sequences and structures through a federated network of protein structure data resources.

Varadi, Mihaly; Nair, Sreenath; Sillitoe, Ian; Tauriello, Gerardo; Anyango, Stephen; Bienert, Stefan; Borges, Clemente; Deshpande, Mandar; Green, Tim; Hassabis, Demis; Hatos, Andras; Hegedus, Tamas; Hekkelman, Maarten L; Joosten, Robbie; Jumper, John; Laydon, Agata; Molodenskiy, Dmitry; Piovesan, Damiano; Salladini, Edoardo; Salzberg, Steven L; Sommer, Markus J; Steinegger, Martin; Suhajda, Erzsebet; Svergun, Dmitri; Tenorio-Ku, Luiggi; Tosatto, Silvio; Tunyasuvunakool, Kathryn; Waterhouse, Andrew Mark; Zídek, Augustin; Schwede, Torsten; Orengo, Christine; Velankar, Sameer.

Gigascience ; 11, 2022 11 30.

Article in English | MEDLINE | ID: mdl-36448847

ABSTRACT

While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.

Subject(s)

Metadata , Records , Amino Acid Sequence , Databases, Protein , Computer Simulation

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.

Varadi, Mihaly; Anyango, Stephen; Deshpande, Mandar; Nair, Sreenath; Natassia, Cindy; Yordanova, Galabina; Yuan, David; Stroe, Oana; Wood, Gemma; Laydon, Agata; Zídek, Augustin; Green, Tim; Tunyasuvunakool, Kathryn; Petersen, Stig; Jumper, John; Clancy, Ellen; Green, Richard; Vora, Ankur; Lutfi, Mira; Figurnov, Michael; Cowie, Andrew; Hobbs, Nicole; Kohli, Pushmeet; Kleywegt, Gerard; Birney, Ewan; Hassabis, Demis; Velankar, Sameer.

Nucleic Acids Res ; 50(D1): D439-D444, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34791371

ABSTRACT

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set.

Subject(s)

Databases, Protein , Protein Folding , Proteins/chemistry , Software , Amino Acid Sequence , Animals , Bacteria/genetics , Bacteria/metabolism , Datasets as Topic , Dictyostelium/genetics , Dictyostelium/metabolism , Fungi/genetics , Fungi/metabolism , Humans , Internet , Models, Molecular , Plants/genetics , Plants/metabolism , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Proteins/genetics , Proteins/metabolism , Trypanosoma cruzi/genetics , Trypanosoma cruzi/metabolism

Highly accurate protein structure prediction for the human proteome.

Tunyasuvunakool, Kathryn; Adler, Jonas; Wu, Zachary; Green, Tim; Zielinski, Michal; Zídek, Augustin; Bridgland, Alex; Cowie, Andrew; Meyer, Clemens; Laydon, Agata; Velankar, Sameer; Kleywegt, Gerard J; Bateman, Alex; Evans, Richard; Pritzel, Alexander; Figurnov, Michael; Ronneberger, Olaf; Bates, Russ; Kohl, Simon A A; Potapenko, Anna; Ballard, Andrew J; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Clancy, Ellen; Reiman, David; Petersen, Stig; Senior, Andrew W; Kavukcuoglu, Koray; Birney, Ewan; Kohli, Pushmeet; Jumper, John; Hassabis, Demis.

Nature ; 596(7873): 590-596, 2021 08.

Article in English | MEDLINE | ID: mdl-34293799

ABSTRACT

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure (ref. 1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
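The residue-level coverage figures in this abstract can be illustrated for a single model. The sketch below assumes the per-residue confidence score (pLDDT) is stored in the B-factor column of a PDB-format file, as in AlphaFold DB coordinate files, and reads it from the C-alpha atom of each residue. The function names are hypothetical, and the thresholds of 70 ("confident") and 90 ("very high") are assumptions taken from the commonly cited confidence bands, not from the abstract itself.

```python
# A minimal sketch, assuming pLDDT sits in the B-factor field (columns 61-66)
# of PDB-format ATOM records. Thresholds are assumed, not from the abstract.
from typing import Dict, List


def plddt_per_residue(pdb_text: str) -> List[float]:
    """Collect one pLDDT value per residue from its C-alpha ATOM record."""
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16 of the fixed-width record.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def confidence_fractions(scores: List[float],
                         confident: float = 70.0,
                         very_high: float = 90.0) -> Dict[str, float]:
    """Return the fraction of residues above each assumed threshold."""
    if not scores:
        raise ValueError("no residues found")
    n = len(scores)
    return {
        "confident": sum(s > confident for s in scores) / n,
        "very_high": sum(s > very_high for s in scores) / n,
    }
```

Applying `confidence_fractions(plddt_per_residue(text))` to every model in a proteome and weighting by residue count would give dataset-level percentages of the kind quoted above, although the exact procedure used by the authors is not described in this abstract.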

Subject(s)

Computational Biology/standards , Deep Learning/standards , Models, Molecular , Protein Conformation , Proteome/chemistry , Datasets as Topic/standards , Diacylglycerol O-Acyltransferase/chemistry , Glucose-6-Phosphatase/chemistry , Humans , Membrane Proteins/chemistry , Protein Folding , Reproducibility of Results
